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Abstract 

The implementation of Bayesian predictive procedures under standard normal mod- 
els is considered. Two distributions are of particular interest, the i^-prime and K- 
square distributions. They also give exact inferences for simple and multiple correla- 
tion coefficients. Their cumulative distribution functions can be expressed in terms 
of infinite series of multiples of incomplete beta function ratios, thus adequate for 
recursive calculations. Efficient algorithms are provided. To deal with special cases 
where possible underflows may prevent recurrence to work properly, a simple so- 
lution is proposed which results in a procedure which is intermediate between two 
classes of algorithms. Some examples of applications are given. 
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1 Introduction 



Bayesian predictive probabilities give statistical users a particularly useful de- 
vice to answer essential questions such as: "how big should be the experiment 
to have a reasonable chance of demonstrating a given conclusion?" "given the 
current data, what is the chance that the final result will be in some sense con- 
clusive, or on the contrary inconclusive?" Traditional frequentist procedures 
(e.g., sample size determination via power calculation), being conditional to 

* Corresponding author. Tel: -^33- 1-5395-4322; fax: +33-1-4577-1659 

Email addresses: jacques.poitevineau@upmc.fr (Jacques Poitevineau), 
bruno.lecoutre@univ-rouen.fr (Bruno Lecoutre). 



Preprint submitted to Elsevier 



March 26, 2010 



the parameters, are carried out under a subset of paremeter values, whereas 
Bayesian predictive probabihties, which consider all possible parameter values 
(conditionally to the data in hand), give them d irect and natural answers . 
Some relevant references are Baum et a] ilQSgySgTegelhalter et al (jlOOj ). 
Johns and Andersenl fll999h lLecoutr3 fl200lh. Ihecoutrel Ssl). iBerrvl fcoosl i. 
Dmitrienko and Wang (120061 ). iGrouin et all (120071 ). In particular, from a pilot 
study, the predictive probabilities on credible limits give a useful summary to 
help in the determination of the sample size of an experiment. If the power 
approach and the predictive a pproach soraetime s result in relatively similar 



sample sizes (for instance, see llnoue et all . |2005| ). in general, the predictive 



approach requires a larger sample size. This can be considered the price to 
pay to avoid assumptions about parent effect size and variance. 



Recently, the Association for Psychological Science has recommended that 
articles published in Psychological Science and thei r other journa ls report the 
"probability of replicating an effect" , denoted prep (IKilleenl . l2005l ) rather than 
the traditional p-value. prep is defined as the predictive probability of finding 
an effect of the same sign in a replication. 



The above procedures are frequently used in the case of comparison of means 
for which the traditional procedures are the t and ANOVA F tests. For sample 
size determination, considering an unknown variance is often seen as an unnec- 
essary sophistication. However, this requires that the sample sizes to be deter- 
mined are relatively high. The probability of replication p^ep ^ such as it now 
appears in Psychological Science - and its extensions frequentl y involve small 
sample sizes, but solutions in use assume a known vari ance ( Lecoutre et all . 
20081 ). One hundred years after Student's famous article (IStudentl . llQOSi ). one 
can hardly be satisfied with this unnecessary restriction. 



The aim of this article is to contribute to implement predictive procedures 
that relax the assumption of a known variance. These procedures in volve the 
■ ft^-pri me and ii'-square distributions that have been introduced in iLecoutre 
(119841 ). They can be characterized as mixtur es of the classic al noncentral t 
and noncentral F distributions respectively (ILecoutrd . Il999l ). In particular, 
the predictive distributions of the t test statistic and the associate limits of 
interval estimates under standard normal models, assuming a conjugate prior, 
is a i^'-prime distribution. The extension to ANOVA F tests involves the K- 
square distribution. Moerover, the i^'-prime and /T-square respectively include 
as particular cases the distributions of the sample correlation coefficient and 
of the sample multiple correlation coefficient, alUowing exact inferences about 
these two coefficients. 



This article provide efficient algorithms for the calculation of the cumulative 
distribution functions (cdfs) of these distributions. These cdfs can be expressed 
in terms of infinite series of multiples of incomplete beta function ratios, thus 
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adequate for recursive calculations. More precisely, both imply the general 
form 

oo 

Y,s^g,H,{x), (1) 



with 



s = ±l, 0<(?, <lVj, Y.9i = ^ 

3=0 



and where Hj{x) involves only the incomplete beta function. 

Dealing with a related problem, the App lied Statistics algorithm AS 278 de- 



velop ed for the psi-square distribution (ILecoutre. Guigues and Poitevineau 



19921 ) could be adapted to match the present cdfs. However, AS 278 is a 



Method 1 recursive algorithm, in the terms of iBenton and Krishnamoorthy 



(120031 ): accumulation is simply done from index (which maximizes Hj{x)) 
until a convergence criterion is met. In some cases, especially when the non- 
centrality parameter of the distribution is large, it can lead to an exceedingly 
large number of iterat ions, and con sequently to unacceptable execution time 



and loss of precision. iFrickl (Il990l ) proposed an improvement that consists 
in starting iterations at an index such that the resulting truncation error is 
negligible, but this does not solve the problem. 



Yet, th e present cdfs are of the general class considered by lBenton and Krishnamoorthy 



(120031 ) and, as such, are good candidates for what they called Method 2 class 
of algorithms. Essentially, this Method 2 is a both backward and forward re- 
cursive algorithm. As these authors assume {gj}j to be the dominant series 
in general (this is discussed in section Hj), the starting index for iterations, say 
k, is chosen so that is a maximum, which reduces the above mentioned 
problems. 

Obviously, the best method would be to start iterations at the index (between 
and k) which maximizes the product gjHj{x) and not only one of the terms. 
However, this is not easy to determine in general when no analytic solution is 
available. Numerical determination would be time consuming and thus would 
overcome the benefit of an optimal starting point (inasmuch as it should be 
calculated for every x). We return to this concern in section m 

Therefore, we present in the next two sections a Method 2 class of algorithms 
applied respectively to the i^-prime and /T-square cdfs, but of general use 
as far as the general form ([1]) is concerned. In section |4] we compare the two 
metods and we discuss some remaining problems and propose, in some cases, a 
simple modification which leads to an algorithm that is intermediate between 
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Method 1 and Method 2. Some examples of apphcations of these cdfs are given 
in section |5] and section |6] is devoted to some concluding remarks. 



2 i^'-prime distribution 



Technical characterizations of the i^T-prime distribution can be found in lLecoutre 
( Il999l ). This distribution is written K'^,^{a) where r are degrees of freedom 



parameters and a is a noncentrality parameter. 

Particular cases of the i^'-prime distributions are: 

a = : K'^^^iQi) = t^. (usual t distribution), 

g = oo : K'^.^{a) = t'j.{a) (noncentral t distribution), 

r = oo : Kg,^{a) = Ag(a) (lambda-prime distribution), 

g = oo, r = oo : K'^ ^{a) = N{a, 1) (normal distribution). 



This cdf has the following properties: 
Pr(ir;,(a) <x) = Pt{K^^{x) > a), 
Pr(i^;,(-a) < -x) = FT{K'g,Xx) > a), 
Pr(ir;,(a) < 0) = Pr(A;(a) < 0) = Pr(t, > a). 

Several cases are to be distinguished for the cdf: 
If a > and a; < 

FT{K'g,{a) < x)=Fr{K'^,Xa) < 0) - Pr(x < /^^.(a) < 0) 

oo 

= Frit,>a)-J2{-iyg,H,ix), 

j=0 



where 



9i 



1 r(g±^) / q \^ / a' \^ 
2r(l + |)r(f) [q + a^J [q + a^J 



Hj{x) = 7^2/(^+^2) I -^7—, - I , 



j + 1 r 
2~' 2 
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and Iz is the incomplete beta function 







If a > and x > 

Pr(ir;,(a) < x) =Pr(ir;,(a) < 0) + Pr(0 < K'g^,{a) < x) 

oo 

= Fr{t,> a) + J2g,Hj{x). 

j=0 

If a < 0, we reduce to the above cases using 
Pr(ir;,(a) <x) = l- Pi{K'^ ,{-a) < -x). 

If a = 0, we simply have 

Pr(K;,,(0) <x) = Pr(t, < x). 

Hence, the cdf of the X-prime involves the calculation of the cdf of the usual 
Student's t distribution and a series of the general form ([T]). The case where a 
and X are of a different sign is an unfavorable one, since the series is then alter- 
nate. Therefore, in the algorithm, the even and odd terms of the series should 
be accumulated separately in order to minimize the number of subtractions. 

The forward and backward recurrence relations for the cdf are straightforward. 
For the Hj's (the incomplete beta function) we have 



x'^ 



2+1 
2 



r 



■ i+r--l\ / ^2 



X 



2 



and for the Qj coefficients 



_q + j 



j+2 q+a 

j q 

q+j -2 a 



j q + a^ 

93-2 = zrr- — n —r- 9r 
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From the forward recurrence relation, it is straightforward that imposing 
9j+2 < 9j leads to j > a^{q — 2)/q — 2. Thus, the starting point for itera- 
tions, say fc, is taken as the mode of the (?j's, i.e. k = [a^{q — 2)/q] — 1, where 
[.] denotes the integer part. 

Given the parameters, Hj{x) is a decreasing function of j. Thus, when stop- 
ping the calculations at step j, the truncation error (Et) is bounded by: 

while j < k 



fc-j-l oo 

Et<Ho{x) 9i + Hk{x) 9^ 

i=0 i=k+j+l 
k—j—1 oo 

<Ho{x) Y 9i + Ho{x) Y 9i 

i=0 i=k+j+l 
k+j 

<Ho{x) 1- Y 9^ 

i=k-j 



and when j > k 



Et < Hk+j{x) 



k+j 



j=0 



Benton and Krishnamoortliyl ( 2003 ) used Et < I — Y.i=k-j 9i instead of ([2]), so 
that the calculation of Hq{x) was avoided. We think that the relaxation of the 
stopping rule compensates for the increased execution time due to one call to 
the incomplete beta function. 



Stopping rule: Stop when Et becomes lower than a predetermined absolute 
error bound. 



3 /^-square distribution 



Techn ical characterizations of the i^-square distribution can be found in lLecoutre 



(Il999l ). This distribution is written Kp^^^^{a'^) where p, q, r are degrees of free- 
dom parameters and is a noncentrality parameter. 

Particular cases of the ii'-square distribution are: 

a = : Kpg^^{0) = Fp^r (usual F distribution), 

g = oo : Kp,^ j.{a^) = Fp^^^o?) (noncentral F distribution). 
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r = oo : Kpg^^{a^) = Ap^^a'^) (lambda-square or alternate chi-square distri- 
bution) , 

g = oo,r = oo : Kp,^^^{a'^) = (l/p)Xp(a^) (noncentral chi-square distribu- 
tion) . 



For the cdf, s = 1 in ([T]) and we simply have 



PT{Kl^^,{a')<x) = J2g,H,{x), 

j=0 



with 



9, - ' 



r(j + l)r(f) \q + ay \q + a\ 



and 



Hj{x) = Ipx/(r+px) ( 2 + 2 ) ' ^ > 0' 



The recurrence relations for the incomplete beta function now write 



r{p/2 + r/2+j) ( px y'^^^ ( r V^' 



\r + px J \1 



^ r(p/2 + J + l)r(r/2) \r+pxj \r + px ^ 
^ ^T{p/2 + r/2 + j-l) ( px . 



r(p/2 + j)r(r/2) \r + pxj \r + px 

and for the Qj coefficients 



g/2 + 3 g' 

j + 1 q + 



The coefficients Qj are the probabilities of obtaining the value j for a variate 
following a negative binomial distribution with parameters q/ (q + a^) and q/2. 
The mode is [a^{q — 2)/(2g)] (where [.] denotes the integer part), hence the 
starting index for iterations. The stopping rule is the same as in the case of 
the i^-prime. 
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4 Limitations and possible improvements 



Drawbacks of Method 1 algorithms (possible underflows and an exceeding 
number of iterations) led to the development of Method 2 algorithms. In 
Method 1, the iterations start at index j = which maximizes Hj{x), while 
in Method 2 they start at index j = k which maximizes gj. 

In Tables 1 and 2, we compare the number of iterations for these two methods, 
as applied respectively to the i^-prime and i^'-square cdfs for various situations 
and with a precision set to 10~^. The ten flrst examples in Table 2 correspond 



to those in Table 1 of iBenton and Krishnamoorthyl (120031 ) for the distribution 



of the square of the sample multiple correlation coefficient. More precisely, 
the correspondence is as follows: the sampling distribution of the multiple 
correlation R^, involving a sample of n independent observations from a m- 
variate normal population with square multiple correlation coefficient p^, is 
such that 

n-m R'^ ^ ^ ( 1^ ^ 



_ I — R2 ' ^ rn-L,n~L,n-m IV" ^ I _ p2 



One last example has been added, corresponding to = 0.33, = 0.50, 
m = 5, n = 100. 

Of course, as soon as both methods attain at least 2k iterations, they return 
identical results as the same terms are summed up (for instance, this is the 
case in the fifth example of Table 1). As can be seen, relatively to Method 1, 
Method 2 can indeed reduce the number of iterations by a great amount: more 
than 60% in m ost of Table 2 examples. When the p recision criterion is turned 



to 10 (as in iBenton and Krishnamoorthyl . 120031 ). the gain diminishes, nat 



urally, but is stil about 40%. However, it is also obvious that Method 2 is not 
systematically better. This can be seen in the last example of Table 2, and is 
especially clear in the case of the i^'-prime distribution (Table 1) where the 
number of iterations can be increased by more than 1000%. It's not surprising 
that Method 1 performs better when the noncentrality parameter is small, but 
it also happens when this parameter is higher, as in the case of the second 
and third examples of Table 1. 

More generally, wheneveriffc(x) tends to zero quickly with respect to k, Method 1 
algorithms perform better than Method 2 algorithms, because only the first 
terms of the series ([1]) contribute significantly to the sum. And when Hk{x) is 
still close to Ho{x), Method 2 is likely to be quasi optimum. 

Furthermore, with Method 2, it can happen that the initial recurrence incre- 
ment for the Hj^s is too small with respect to the machine limit so that a 
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Table 1 

Comparison between Methods 1 and 2 for the iT-primc cdf algorithm. Mi is num- 
ber of iterations for Method i. Gain is the gain, in percentage, of Method 1 over 
Method 2, a negative number indicates Method 1 performs better. 



X 


q 


r 


a 


PT{K'^{a) <x) 


Ml 


M2 


gain 


1 


5 


20 


10 


0.0007 


9 


119 


-1222% 


11 


5 


20 


50 


0.0017 


332 


2999 


-803% 


40 


50 


50 


50 


0.0612 


2892 


4799 


-60% 


40 


50 


5 


50 


0.4277 


4387 


4799 


-9% 


50 


50 


20 


30 


0.5242 


1844 


1844 


0% 


40 


100 


5 


50 


0.1783 


3644 


3224 


12% 


45 


100 


10 


40 


0.6377 


2499 


2084 


17% 


65 


1000 


15 


50 


0.8820 


3007 


1052 


65% 



Table 2 

Comparison between methods 1 and 2 for the iC-square cdf algorithm. Mi is num- 
ber of iterations for Method i. Gain is the gain, in percentage, of Method 1 over 
Method 2, a negative number indicates Method 1 performs better. 



X 


V 


9 


r 






Ml 


M2 


gain 


36 


2 


20 


18 


46.667 


0.7771 


57 


57 


0% 


0.19444 


4 


11 


7 


4.7143 


0.0126 


3 


3 


0% 


288 


3 


99 


96 


891 


0.4382 


618 


598 


3% 


972 


11 


1199 


1188 


10791 


0.4339 


5953 


1844 


69% 


795.2 


5 


999 


994 


3996 


0.4661 


2246 


796 


65% 


475.2 


5 


599 


594 


2396 


0.4562 


1390 


624 


65% 


715.2 


5 


899 


894 


3596 


0.4643 


2033 


756 


63% 


202.909 


11 


1499 


1488 


2248.5 


0.4297 


1252 


420 


66% 


216.545 


11 


1599 


1588 


2398.5 


0.4319 


1331 


433 


67% 


223.364 


11 


1649 


1638 


2473.5 


0.4330 


1371 


439 


68% 


11.6978 


4 


99 


95 


99 


0.0063 


47 


90 


-91% 



zero is returned and recurrence is impossible: e.g., for the i^-square cdf, this 
increment term is lower than 10~^°^ when p = 10, g = 20, r = 30, c? — 500 
and X — 0.1. So, both methods are subject to underflows, whether through 
the g'j's (Method 1) or whether through the iij{x)'s (Method 2). 

A tempting solution, when Hk{x) is too small, would be to choose a modi- 
fied index, say k', such that Hk'{x) reaches a predetermined value (i.e. one 
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markedly above the machine hmit); unfortunately, such an inversion of the 
beta cdf involves an iterative procedure and so is to be discarded on grounds 
of speed efficiency. An alternative solution is to lower k by some amount. This 
amount will depend, among others, on x. Given the parameters and j, Hj{x) 
is an increasing function of the argument of the incomplete beta function, 
say z, that is itself a function of x. The lower H.j{x). the more k has to be 
lowered. Thus, for sake of simplicity and as a first attempt, we propose to use 
the identity function on z so that k is simply lowered by multiplying it by the 
argument of the incomplete beta function {px/ {px + r) for the i^-square and 
x'^/ (x^ + r) for the prime). 

This modification avoids underfiows in the preceding example. Furthermore, 
it sometimes permits to reduce the number of iterations. Thus, it could be 
introduced as soon as Hk{x) is below some arbitrary threshold (e.g., when 
Hk{x) I Hq{x) < 0.01) and not only when a true underflow occurs. For instance, 
for the distribution -ft'io so 2oo(^00); when x takes the values 35, 30, 20, and 
10, the number of iterations is always 390 (for a precision of 10~^), while 
when turning to the modified starting index, it drops respectively to 309, 
291, 243 and 163. In the first example of Table 1, the modification leads to 
9 iterations (instead of 119 with the unmodified version), just as Method 1. 
Obviously, it is not relevant when Hj{x) diminishes rather slowly with j, 
which is the case for the Table 2 examples, except the last one. In that last 
example, the modification leads again to the same number of iterations as 
Method 1 (47). Another example of reduction of iterations, concerning the 
X-prime distribution, is given in the next section. 

Therefore, we could finally suggest the following tactic: 

(1) Calculate goHQ{x) and gkHk{x) and choose as the starting index (0 or k) 
the one which leads to the maximum. 

(2) If is chosen and recurrence is impossible, try k. 

(3) If k is chosen and recurrence is impossible (or if Hk{x) is very small 
compared to Ho{x)), multiply it by the argument of the incomplete beta 
function (this can be repeated). 



5 Examples of applications 

5.1 Predictive probabilities 

Suppose a simple two-sample experiment was designed to compare a new 
drug with a placebo. For this purpose, the investigators used a two-sample t 
test with equal numbers of subjects rii — 10 in each group, in order to test 
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Ho : 5 = against the alternative Hi : 5 > 0. Let us denote by mi the 
sample mean difference in the current data and by Si the pooled estimate of 
the common standard deviation a. The observed t statistic was Ti = 1.10, 
hence p = 0.143 (one-tailed). 



Let us consider a conjugate prior for (/x, cr^), characterized by 
no " \ go 



^i\a^ ~ iV(mo, —a^) and ~ sl^^"' 



Lecoutrd (119991 ) demonstrated that the predictive distribution of the t test 



statistic for n future observations is a i^-prime distribution 



1 + ^ K,2n-. "7=^ Where To = ^ ^2. 



As a particular case, here the prior is the posterior distribution from the 
available data, starting with the usual noninformative prior p(/x,cr^) oc 1/cr^, 
hence mg = mi, sq = Si, = ni and go = 2ni — 2. We get for a replication 
(n = ni = 10) the predictive distribution 



that only depends on the observed t test statistic Ti = 1.10 and its associated 
degrees of freedom. 

We can compute the probability of finding a positive mean in a replication 
(Killeen's Prep) as 



^^(^ki8(^) >o) = 0.777. 



We can also compute the predictive probability of a significant replication. 
For instance we find the probability 0.334 that the one-tailed p value will be 
less than 0.05 (i.e. t > 1.734): 



Pr(V2K[,^,,{^) > 1.734) = Pr^K[,^,,{^) > 1.226) = 0.334 



The investigators generally largely underestimate this probability: see lLecoutre and Rouanet 



( I1993I ) , iLecoutrd (120001 1 . Note that there is also a non negligible probability of 
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finding a significant effect in the negative direction: Pr{\/2 K[g ^g(1.10)/-\/2) < 
-1.734) = 0.027. 

An example of applicati on to sample size de termination in clinical trials from 



a pilot study is given in iGrouin et all ( 120071 ) 



The predictive probabilities for F ratios and usual standardized effect size 
measures in ANOVA designs can be computed from the i^'-square distribu- 
tion. Let us consider for instance the data of a one-way design with g groups 
of equal sample sizes hq. Let Fq, the observed ANOVA F ratio for the overall 
comparison of the means. Assuming before the experiment the usual non infor- 
mative prior, the posterior predictive distribution for the F ratio i n a future 



exper iment with equal sample sizes n is a ii"-square distribution (ILecoutre 



19991): 



1 + ^ 




5.2 Distributions of correlation coefficients 



Other applications of the if-prime and ii'-square distributions are exact infer- 
ences for correlation coefficients. The sampling distribution of the correlation 
coefficient r, involving a sample of n independent observations from a bivariate 
normal population with population coefficient p, is such that 



so that exact tests and confidence limits for p can be computed from the K- 
prime cdf. For instance, when n = 250 and assuming p = 0.80, the probability 
to observe a sample r lower than 0.75 is 0.0227. If this is calculated using 
the standard Method 2 algorithm with a precision of 10^^^, 860 iterations 
are required, whereas only 595 are needed with the modification proposed in 
section H] (if the precision is set to 10~^, the numbers of iterations become 
respectively 546 and 502). 

Moreover, in the Bayesian framework, assuming a uniform prior for p, the 
posterior distribution is also a i^-prime distribution. 

The sampling distribution of the multiple correlation has been presented 
in section HI 
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6 Concluding remcirks 

We presented an algorithm for two Bayesian predictive distributions involved 
in the designing of experiments and in the computation of "the probability of 
replication" under usual normal models. Furthermore, we used these distribu- 
tions to compare two available methods for computing cdfs that are expressed 
as discrete mixtures of continuous distributions (the incomplete beta function 
in our case). If in many cases the two methos are likely to perform equally 
well, it appeared that none of them is systematically better, depending, among 
others, upon the particular functions involved in the cdfs. and that they both 
suffer a comparable problem: due to underflows, the starting index of itera- 
tions can be such that recurrence is impossible. Method 2 was proposed to 
avoid Method 1 underflows, and here we proposed to manage Method 2 un- 
derflows by lowering the starting index by a quantity which is the argument of 
the incomplete beta function. This is a tentative solution that can be viewed 
as a crude approach to the problem of flnding the optimum starting index. 
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